Creating a learner corpus infrastructure: Experiences from making learner corpora available
Similar resources
Building a learner corpus
The paper describes a corpus of texts produced by non-native speakers of Czech. We discuss its annotation scheme, consisting of three interlinked levels to cope with a wide range of error types present in the input. Each level corrects different types of errors; links between the levels allow capturing errors in word order and complex discontinuous expressions. Errors are not only corrected, bu...
Creating a manually error-tagged and shallow-parsed learner corpus
The availability of learner corpora, especially those which have been manually error-tagged or shallow-parsed, is still limited. This means that researchers do not have a common development and test set for natural language processing of learner English such as for grammatical error detection. Given this background, we created a novel learner corpus that was manually error-tagged and shallowpar...
Spoken English Learner Corpora
In this paper we present a survey of some of the most significant spoken English learner corpora created to date. Spoken learner corpora, which include speech produced by learners, are important in many areas of research and practice, in particular for identifying typical pronunciation errors of learners of English as a second language (ESL), English as a foreign language (EFL), or English as a lin...
Dependency Annotation for Learner Corpora
Building on the CHILDES dependency annotation scheme and on interlanguage POS annotation, we describe a syntactic annotation scheme developed for the data of second language learners. We encode subcategorization frames and underlying dependencies, in addition to the usual surface dependencies. The annotation scheme is relatively independent of language and can be mapped to learner errors.
Journal
Journal title: ITM Web of Conferences
Year: 2020
ISSN: 2271-2097
DOI: 10.1051/itmconf/20203303006